There has been a surge in biomedical imaging technologies with the recent
advancement of deep learning. Deep learning is being used for diagnosis from
X-ray, computed tomography (CT) scan, electrocardiogram (ECG), and
electroencephalography (EEG) images. However, most existing methods target
a single disease. In this research, a computer-aided deep
learning model named COVID-CXDNetV2 is presented to detect two
separate diseases, coronavirus disease 2019 (COVID-19) and pneumonia,
from X-ray images in real time. The proposed model is based on
you only look once (YOLOv2) with a residual neural network (ResNet) and is
trained on an X-ray image dataset containing 3,788 samples of three
classes: COVID-19, pneumonia, and normal. The model obtained
a maximum overall classification accuracy of 97.9% with a loss of 0.052
for multiclass classification (COVID-19, pneumonia, and normal) and
99.8% accuracy, 99.52% sensitivity, and 100% specificity with a loss of 0.001
for binary classification (COVID-19 and normal), which exceeds several current
state-of-the-art results. The authors believe that this method will be applicable in
the medical domain for diagnosis and will contribute significantly in practice.
1. INTRODUCTION


Coronaviruses (CoVs) are a large family of harmful viruses that can infect humans and other
animals and can even cause death. In the 21st century, two zoonotic CoVs, Middle East respiratory
syndrome coronavirus (MERS-CoV) and severe acute respiratory syndrome coronavirus (SARS-CoV),
spread from animal reservoirs to cause global epidemics with alarming morbidity and mortality [1].
More recently, coronavirus disease 2019 (COVID-19), which is caused by severe acute respiratory syndrome
coronavirus 2 (SARS-CoV-2), emerged in Wuhan, China, in December 2019 [2]. It has spread
all over the world, affecting 307 M people and causing 5.5 M deaths as of Jan 09, 2022
[3]. The COVID-19 epidemic was declared a global pandemic on March 11, 2020, by the World Health
Organization [4].
This virus is contagious and can be transmitted through close contact with infected people or even by
sharing everyday objects [5]. Isolation is therefore a standard measure to control the spread of this disease. Even
though the reverse transcription-polymerase chain reaction (RT-PCR) is presently the mainstream procedure
for detecting COVID-19 [6], this method has some limitations: it requires special test kits, it is costly and
time-consuming, and its results have a high false-negative rate and must be considered in the context of the
patient's recent exposures and the existence of clinical signs and symptoms [7]. Moreover, this method has a low
sensitivity of 60 to 71% for COVID-19 identification due to a low viral load in the test specimen
and laboratory error [8]. Pneumonia is also a lung infection, with a syndrome similar to that of COVID-19, that is
caused by viruses, bacteria, and fungi. Pneumonia can range in seriousness from mild to life-threatening. It is highly
hazardous for older people (age >65), infants and young children, and people with health issues or weakened
immune systems.

Medical image processing is another feasible technique for COVID-19 or pneumonia detection from
chest X-ray (CXR) or computed tomography (CT) images. Thoracic abnormalities are detected in CT images with high
sensitivity, so many researchers are focusing on CT images to detect COVID-19 [9]. However, CT has
some drawbacks, including limited portability, the need for deep cleaning of the apparatus after use, the
high radiation dose, and the higher cost [10]. On the other hand, CXR is very common, cost-effective,
portable, available in almost any diagnostic center, and easily accessible [11]. Therefore, a CXR-based method,
which can easily reveal lung abnormalities, can be a good alternative tool to diagnose COVID-19 and
pneumonia.

In recent times, deep learning has become a leading technique in medical imaging research. Image classification,
segmentation, and pattern finding are the most common tasks, and they are handled efficiently by convolutional
neural networks (CNNs). This technique has already proven successful in detecting bleeding, breast
cancer, pneumonia, skin cancer, arrhythmia, diabetic retinopathy, brain disease, and so on. The fast transmission
of COVID-19 requires more radiologists to support the diagnostic centers, which is almost
impossible to achieve within a short span of time. The proposed deep learning technique for COVID-19 and pneumonia
detection can help in this respect and reduce the cost of test kits, technologists, and other logistics.

To handle the rapidly growing number of COVID-19 and pneumonia cases, researchers
commonly use CT and X-ray images. Ozturk et al. [12] presented a deep learning model named
DarkCovidNet, based on the darknet object detection method, which can detect COVID-19 from X-ray
images. That article reported a classification accuracy of 87.02% for a three-category classification problem
including COVID-19, normal, and pneumonia. Heidari et al. [13] derived a computer-aided diagnosis
scheme from a CNN to diagnose COVID-19-infected pneumonia, which showed an overall accuracy of
94.5% in a three-class disease classification problem. Similarly, researchers have used DenseNet-121 in
COVID-CXNet to classify COVID-19 from X-ray images [14]. Fusing two models can
increase the accuracy of a deep learning-based model. For example, Rahimzadeh and Attar [15]
fused the Xception and ResNet50V2 neural networks and achieved 91.4% accuracy. Khan et al.
[16] derived a deep CNN model named CoroNet from the Xception model, which reached 95%
accuracy. Transfer learning is also an effective approach to medical image classification, especially when
the number of samples in the dataset is small [17]. For example, Abbas et al. [18] applied a transfer
learning approach and achieved an accuracy of 95.12% in detecting COVID-19 from CXR images.

This research proposes a novel deep learning model named COVID-CXDNetV2, based on a
modification of you only look once (YOLOv2) [19] with ResNet [20], for detecting COVID-19 and
pneumonia patients as a multiclass classification problem. In addition, a customized CXR image dataset was
formed to train the model by collecting data from four different open-source repositories. The dataset
contains 1,102 COVID-19 positive, 1,341 normal, and 1,345 viral pneumonia CXR images. The
model takes raw CXR images as input and gives COVID-19, pneumonia, or normal as output. The
main limitation of CXR-based COVID-19 and pneumonia detection is that it usually cannot detect the disease accurately
at a very early stage, as CXR does not have high sensitivity in detecting
ground-glass opacities [7].

The rest of the paper is organized into three sections. Section 2 presents the method of this research,
which includes the dataset, dataset preprocessing, and the proposed model architecture. The performance evaluation
of the model and a comparison with existing state-of-the-art models in this field are discussed in section 3.
Finally, the conclusion and future direction are provided in section 4.


2. METHOD 

A novel deep learning model is introduced in this study to automatically distinguish confirmed
COVID-19 patients from viral pneumonia and normal patients using traditional 2D CXR images. An
open-access customized dataset is also developed to support training and evaluation of the proposed model. The
complete operation of the proposed framework is represented in Figure 1: first, the dataset is
collected from different authentic sources; next, the data are preprocessed to fit the proposed model; finally, the
deep learning model is applied to identify diseases from the CXR images.
Figure 1. Flow diagram of the proposed method


2.1. Dataset 

Small datasets are one of the challenges of medical image processing. In this research,
CXR images are used to detect COVID-19. However, only a few COVID-19 CXR images are publicly
available because COVID-19 is a recent disease. This study assembles a dataset
of CXR images taken from four publicly accessible medical repositories, which are listed in Table 1.

The customized dataset is generated by filtering the four datasets for posteroanterior (PA) views of
X-ray images while avoiding data leakage. After assembly, 3,788 2D CXR images are used for
COVID-19 detection; among them, 1,102 images are COVID-19 positive, 1,345 images are viral
pneumonia, and 1,341 images are normal cases. The customized dataset is publicly available on Kaggle [21].


Table 1. Summary of dataset collection 


Name             Pneumonia   COVID-19   Normal
Dataset_1 [22]   0           59         0
Dataset_2 [23]   0           53         0
Dataset_3 [21]   1345        219        1341
Dataset_4 [14]   0           818        0


2.2. Dataset preprocessing 

The dataset is collected from four different sources, so the size of the CXR images varies from
205x243 to 3804x3487 pixels. During training, all images are resized to 256x256 pixels. Images are also
normalized to reduce computation time and random access memory (RAM) usage. For
deep learning algorithms, data augmentation is a strategy to enlarge the dataset for real-time practical
applications, and it is particularly useful in medical imaging, where datasets are small.
Data augmentation generates multiple variants of one image through geometric
transformations, blur, luminance changes, flipping, noise injection, color modification, cropping, and rotation,
for effective and generalized training. In this experiment, the parameter values
p_lighting: 0.75, p_affine: 0.75, maximum zoom: 1.1, maximum warp: 0.2, and maximum lighting: 0.2 are
used for the CXR image transformation. However, because of the fixed location of the heart, lungs, and other organs,
flipping of the CXR image is not applied. Augmentation also helps reduce overfitting when training the model.
After data augmentation, the dataset is split into two parts: 80% of the total number of CXR images is used for
training, and 20% is used for validation.
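
The 80/20 split described above can be illustrated with a minimal sketch. The file names, the helper `split_dataset`, and the fixed seed are hypothetical and only stand in for the actual data pipeline, which the paper does not describe in detail; in particular, whether the split was stratified by class is not stated, so a plain shuffled split is assumed here.

```python
import random


def split_dataset(items, train_fraction=0.8, seed=42):
    """Shuffle a copy of `items` and split it into (train, validation) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seed is an arbitrary illustrative choice
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical file names standing in for the 3,788 CXR images.
image_ids = [f"cxr_{i:04d}.png" for i in range(3788)]
train_ids, val_ids = split_dataset(image_ids)
print(len(train_ids), len(val_ids))
```

Under these assumptions, the training part holds roughly four fifths of the images and the remainder is kept for validation, with no image appearing in both parts.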


2.3. Model architecture 

In this study, the proposed model was developed using a deep learning approach. Deep learning is a
subfield of artificial intelligence (AI) that trains models using artificial neural networks (ANNs).
There are several kinds of deep learning algorithms, but CNNs are the most widely used. YOLO is one of the
well-known deep learning models that utilize a CNN architecture for real-time object detection tasks. This
technique applies a single neural network to the entire image, splits the image into regions, and predicts the
bounding boxes and probabilities for each region. The estimated probabilities are used to weight these boxes. The
proposed COVID-CXDNetV2 architecture was designed by modifying YOLOv2, inspired by Ozturk et
al. [12], with a residual neural network (ResNet) to distinguish COVID-19 patients from pneumonia and normal CXR
images. Residual connections are shortcut or skip connections that take activations from one layer
and feed them to a later layer. With such a connection, the output of the weight layer sequence is added to the
earlier activation (the block input), and the sum then passes through a non-linear activation function.
ResNet is built from residual blocks that help in training deeper networks. The advantage of training a
residual network is that the training error does not rise when training a deeper network. The residual
connection is shown in Figure 2.
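
As a rough illustration of the residual connection described above, the following sketch computes activation(F(x) + x) on a plain list of numbers. The function names, the toy weight layer, and the use of ReLU as the activation are assumptions made for illustration; they are not the layers of the proposed model.

```python
def relu(values):
    """Element-wise ReLU, used here as a stand-in non-linear activation."""
    return [max(0.0, v) for v in values]


def residual_block(x, weight_layers, activation=relu):
    """Return activation(F(x) + x), where F is the stacked weight layers."""
    fx = weight_layers(x)
    if len(fx) != len(x):
        raise ValueError("F(x) and x must have the same shape to be added")
    return activation([f + xi for f, xi in zip(fx, x)])


def toy_weight_layers(x):
    """Toy stand-in for the weight layers: scale every activation by -0.5."""
    return [-0.5 * v for v in x]


print(residual_block([2.0, -4.0, 1.0], toy_weight_layers))
```

If the weight layers output zeros, the block reduces to the activation applied to its input, which is one way to see why adding such blocks need not increase the training error.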


Figure 2. Residual connection 


The proposed COVID-CXDNetV2 architecture is presented in Figure 3, with its components marked in different colors
to make it easier to understand. The proposed model consists of 14 convolution blocks, five max-pooling
layers, four residual connections, one convolution layer, one flatten layer, and one linear layer with various
filter numbers, sizes, and stride sizes. The input is a color image of dimension 256x256x3, that is,
a CXR image of height 256 and width 256 with three channels (red, green, and blue). The first
convolutional block, which takes the image as input, has eight kernels of size 3x3 with padding one and stride one. Each
convolution block has a single convolution layer with batch normalization and the leaky rectified linear unit
(LReLU) activation function. The convolution layer is a central component of the CNN framework that
uses the convolution operation (*) rather than general matrix multiplication. It involves a set of learnable
filters for detecting features in the input image. If I is the input image, K is the kernel, mxn is the kernel size, and S
is the output, the 2D convolution is given by (1).


$$S(i, j) = (I * K)(i, j) = \sum_{m} \sum_{n} I(m, n)\, K(i - m, j - n) \tag{1}$$
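
To make the convolution sum concrete, here is a minimal pure-Python sketch of the discrete 2D convolution S(i, j) = sum over m, n of I(m, n) K(i - m, j - n). The function name `conv2d_full` is hypothetical, and the sketch computes the "full" output without padding or stride; deep learning frameworks typically implement a padded, strided cross-correlation variant instead, so this is an illustration of the formula rather than of the model's layers.

```python
def conv2d_full(image, kernel):
    """Discrete 2D convolution: S(i, j) = sum_m sum_n I(m, n) * K(i - m, j - n)."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0] * (iw + kw - 1) for _ in range(ih + kh - 1)]
    for m in range(ih):
        for n in range(iw):
            for p in range(kh):          # p = i - m
                for q in range(kw):      # q = j - n
                    out[m + p][n + q] += image[m][n] * kernel[p][q]
    return out


image = [[1, 2],
         [3, 4]]
kernel = [[1, 0],
          [0, -1]]
result = conv2d_full(image, kernel)
for row in result:
    print(row)
```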
Figure 3. Architecture of COVID-CXDNetV2


Batch normalization is used to normalize the inputs to each layer, reduce training time, and improve
model robustness. The LReLU non-linear activation function is used because it keeps a small, non-zero slope for
negative inputs, so its derivative there is a small fraction rather than zero. Then a max-pooling layer with a 2x2
filter and a stride of 2 is applied. After this step, the output of shape 128x128x8 is passed to the second
convolution block, which contains 16 filters. In Figure 3, the third block starts with a max-pooling layer with a 2x2
filter and a stride of 2, applied after the second convolution block, which downsizes the output to 64x64x16. The
block also includes three convolutional blocks with 32, 16, and 32 kernels of size 3x3, 1x1, and 3x3,
respectively, as well as one residual connection. This pattern is repeated in the fourth, fifth, and sixth blocks in the figure.
After that, a single convolution layer with two kernels of size 3x3 and stride 1, a flatten layer, and a linear layer are
used. The linear layer has 3 neurons for multiclass classification (pneumonia, normal, and COVID-19) and
1 neuron for binary classification (COVID-19 or normal). All layers, their
parameters, and the output shapes of the proposed architecture are shown in Table 2.
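
The two element-level operations mentioned above, LReLU and 2x2 max pooling with stride 2, can be sketched in a few lines. The negative slope of 0.1 is an assumption, since the paper does not state the value used, and the function names and the small feature map are hypothetical.

```python
def leaky_relu(x, negative_slope=0.1):
    """Pass positive inputs through; scale negative inputs by a small slope.

    The slope 0.1 is an illustrative assumption, not a value from the paper.
    """
    return x if x > 0 else negative_slope * x


def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2D list with even height and width."""
    return [
        [max(feature_map[r][c], feature_map[r][c + 1],
             feature_map[r + 1][c], feature_map[r + 1][c + 1])
         for c in range(0, len(feature_map[0]), 2)]
        for r in range(0, len(feature_map), 2)
    ]


feature_map = [
    [1.0, -2.0, 3.0, 0.0],
    [-4.0, 5.0, -6.0, -1.0],
    [0.5, -0.5, -3.0, -2.0],
    [-1.5, 2.5, -8.0, -4.0],
]
activated = [[leaky_relu(v) for v in row] for row in feature_map]
pooled = max_pool_2x2(activated)
print(pooled)
```

Pooling halves the height and width of the feature map, which matches the 256x256 to 128x128 reduction after the first block.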


Table 2. The architectural summary of the proposed model 


Type            Filters   Size/Stride   Output
Convolutional   8         3x3/1         256x256
Maxpool         8         2x2/2         128x128
Convolutional   16        3x3/1         128x128
Maxpool         16        2x2/2         64x64
Convolutional   32        3x3/1         64x64
Convolutional   16        1x1/1         66x66
Convolutional   32        3x3/1         66x66
Residual        -         -             66x66
Maxpool         32        2x2/2         33x33
Convolutional   64        3x3/1         33x33
Convolutional   32        1x1/1         35x35
Convolutional   64        3x3/1         35x35
Residual        -         -             35x35
Maxpool         64        2x2/2         17x17
Convolutional   128       3x3/1         17x17
Convolutional   64        1x1/1         19x19
Convolutional   128       3x3/1         19x19
Residual        -         -             19x19
Maxpool         128       2x2/2         9x9
Convolutional   256       3x3/1         9x9
Convolutional   128       1x1/1         11x11
Convolutional   256       3x3/1         11x11
Residual        -         -             11x11
Convolutional   2         3x3/1         11x11
Flatten         -         -             242
Linear          -         -             3 or 1
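
The output sizes in Table 2 can be checked with the standard formula out = floor((in + 2p - k) / s) + 1. The sketch below traces the spatial size through the convolution and max-pooling rows of the table. The padding of 1 on the 1x1 convolutions is an assumption inferred from the table (the size grows by 2 after each 1x1 layer); the paper states the padding only for the first block. Residual rows are omitted because they do not change the size, and all names are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def triple_block(filters):
    """3x3, 1x1, 3x3 convolution blocks as (type, filters, kernel, stride, padding).

    Padding 1 on the 1x1 layer is an assumption inferred from Table 2.
    """
    return [("conv", filters, 3, 1, 1),
            ("conv", filters // 2, 1, 1, 1),
            ("conv", filters, 3, 1, 1)]


layers = (
    [("conv", 8, 3, 1, 1), ("pool", 8, 2, 2, 0),
     ("conv", 16, 3, 1, 1), ("pool", 16, 2, 2, 0)]
    + triple_block(32) + [("pool", 32, 2, 2, 0)]
    + triple_block(64) + [("pool", 64, 2, 2, 0)]
    + triple_block(128) + [("pool", 128, 2, 2, 0)]
    + triple_block(256)
    + [("conv", 2, 3, 1, 1)]
)


def trace_shapes(input_size, layer_specs):
    """Return the spatial size after each layer, starting from `input_size`."""
    sizes, size = [], input_size
    for _kind, _filters, kernel, stride, padding in layer_specs:
        size = conv_out(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


sizes = trace_shapes(256, layers)
flatten_features = layers[-1][1] * sizes[-1] ** 2  # channels x height x width
print(sizes)
print(flatten_features)
```

Under these assumptions, the traced sizes agree with the Output column of Table 2, including the 242 features entering the linear layer.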


The proposed model is implemented using the PyTorch 1.4 open-source machine learning framework.
The network is trained using the Adam optimizer with a learning rate of 3e-3, a batch size of 32,
and a maximum of 100 epochs. The Adam optimizer adapts the learning rate for each weight to
decrease the loss of the network. The cross-entropy loss function is used to evaluate
the performance of the model since the task is a classification problem. All
computation and analysis are done on the Google Colaboratory platform with a Tesla T4 GPU.
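
For readers unfamiliar with the loss, the following sketch computes the softmax cross-entropy for a single sample from raw class scores (logits). It is a plain-Python illustration of the definition, not the PyTorch implementation used in the experiments; the function name and the example logits are hypothetical.

```python
import math


def cross_entropy(logits, target_index):
    """Softmax cross-entropy for one sample: -log(softmax(logits)[target])."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target_index]


# Three classes, e.g. COVID-19, normal, pneumonia (the order is arbitrary here).
confident_and_right = cross_entropy([5.0, 0.0, 0.0], target_index=0)
confident_and_wrong = cross_entropy([5.0, 0.0, 0.0], target_index=1)
uninformed = cross_entropy([0.0, 0.0, 0.0], target_index=0)
print(confident_and_right, confident_and_wrong, uninformed)
```

The loss is small when the score of the true class dominates, large when a wrong class dominates, and equal to log(3) when all three classes receive the same score.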


3. EVALUATION AND DISCUSSION 
The performance of the COVID-CXDNetV2 model is evaluated using five metrics:
accuracy, sensitivity, specificity, precision, and F1 score [24]-[26]. They are defined as (2) to (6).


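$$\text{Accuracy} = \frac{\sum TP + \sum TN}{\sum TP + \sum TN + \sum FP + \sum FN} \tag{2}$$

$$\text{Sensitivity} = \frac{\sum TP}{\sum TP + \sum FN} \tag{3}$$

$$\text{Specificity} = \frac{\sum TN}{\sum TN + \sum FP} \tag{4}$$

$$\text{Precision} = \frac{\sum TP}{\sum TP + \sum FP} \tag{5}$$

$$\text{F1 score} = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} \tag{6}$$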
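
The five metrics can be computed directly from the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The sketch below uses hypothetical counts chosen only for illustration; they are not taken from the confusion matrices of this study, and the function name is likewise an assumption.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, precision, and F1 score from counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for one class treated as "positive" versus the rest.
metrics = classification_metrics(tp=90, tn=95, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For a multiclass problem, the same function can be applied per class by treating that class as positive and the remaining classes as negative.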
In this study, two experiments (one for multiclass classification and another for binary
classification) are performed to judge the efficiency and performance of the proposed COVID-CXDNetV2
model in classifying COVID-19 or pneumonia from CXR images. The confusion matrices for multiclass and
binary classification are shown in Figure 4.

According to (2)-(6), the COVID-CXDNetV2 model obtained an overall classification accuracy of
97.33% for multiclass disease classification. For COVID-19 detection, the model achieved 99.13%
sensitivity, 99.81% specificity, 99.56% precision, and a 99.34% F1 score. Similarly, the performance for
detecting pneumonia alone is shown in Table 3. The model also obtained 99.79% classification
accuracy, 99.52% sensitivity, 100% specificity, 100% precision, and a 99.76% F1 score for binary
classification (normal and COVID-19).
Figure 4. Confusion matrix for multiclass classification and binary classification respectively


Table 3. The performance of the model for each disease 


Class Precision Sensitivity Specificity F1 score 
COVID-19 99.56% 99.13% 99.81% 99.34% 
Normal 95.11% 98.44% 96.68% 96.75% 
Pneumonia 97.64% 94.66% 98.77% 96.13% 


The accuracy, validation loss, and training loss of COVID-CXDNetV2 have been evaluated
at different numbers of epochs (20, 40, 60, 80, 100), as shown in Table 4. The model may
overfit if the number of epochs is very high, with the training accuracy approaching 100%. A validation loss higher
than the training loss indicates overfitting; the opposite indicates underfitting. For a good fit of the model, the
difference between validation loss and training loss should be minimal. Following this criterion, the best results
were obtained at epoch 80 for multiclass classification and epoch 100 for binary classification. In both multiclass
and binary classification, the training and validation losses
decrease as the number of epochs increases, as shown in Figure 5. At the start of the loss
graph, the training and validation losses differ strongly. After a few epochs, the training and
validation losses are approximately equal. Toward the end of the graph, the validation loss increases
gradually. The model thus obtained a maximum overall classification accuracy of 97.9% at 80 epochs for
multiclass and 99.8% at 100 epochs for binary classification, where the losses and the difference between training
and validation loss are low.


Table 4. Performance evaluation of multiclass and binary classification for different epochs 


          Multiclass classification                        Binary classification
Epoch     Accuracy (%)  Training loss  Validation loss     Accuracy (%)  Training loss  Validation loss
20        92.5          0.212981       0.202916            97.9          0.067623       0.067721
40        94.9          0.141943       0.153560            99.5          0.033901       0.023971
60        96.9          0.083831       0.088530            99.8          0.018865       0.013667
80        97.9          0.052176       0.070637            99.8          0.007013       0.015052
100       97.3          0.034904       0.085641            99.8          0.001320       0.014740
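
The epoch comparison can be reproduced from the multiclass columns of Table 4. The sketch below picks the epoch with the highest accuracy and prints the gap between validation and training loss at each checkpoint; the selection rule and the helper names are illustrative assumptions, not the authors' exact procedure.

```python
# Multiclass rows of Table 4: (epoch, accuracy %, training loss, validation loss)
multiclass_rows = [
    (20, 92.5, 0.212981, 0.202916),
    (40, 94.9, 0.141943, 0.153560),
    (60, 96.9, 0.083831, 0.088530),
    (80, 97.9, 0.052176, 0.070637),
    (100, 97.3, 0.034904, 0.085641),
]


def loss_gap(row):
    """Validation loss minus training loss; a growing gap suggests overfitting."""
    return row[3] - row[2]


best_row = max(multiclass_rows, key=lambda row: row[1])  # highest accuracy
for row in multiclass_rows:
    print(f"epoch {row[0]:3d}  accuracy {row[1]:.1f}  gap {loss_gap(row):+.6f}")
print("best epoch by accuracy:", best_row[0])
```

Between epochs 80 and 100 the training loss keeps falling while the validation loss rises, which is the overfitting pattern described in the text.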
Many researchers are working on COVID-19 or pneumonia detection from
CXR images. Ozturk et al. [12] presented DarkCovidNet to identify COVID-19 cases from CXR images and
obtained an accuracy of 87.02% for multiclass classification and 98.08% for binary
classification. Another deep learning model [15], based on Xception + ResNet50V2, obtained an
accuracy of 91.4% for three classes. Apostolopoulos and Mpesiana [27] achieved classification accuracies of
93.48% and 93.5% using the VGG19 and MobileNetV2 models, respectively, for three-class classification.
A deep learning network, CoroNet [16], was used to detect COVID-19 from CXR images and achieved 95%
accuracy for multiclass classification. The results of the proposed COVID-CXDNetV2 model are
superior to those of the other existing studies, as shown in Table 5. The table includes only
studies that analyzed the performance of COVID-19 detection from CXR images.

CXR images are recommended for disease identification because they are easily accessible. During
the pandemic, they have been frequently used in health clinics throughout the world. As the
proposed COVID-CXDNetV2 performs comparatively better than the other existing
work, it can be used to diagnose COVID-19 in a simple manner.


[Figure 5: two panels, each plotting train loss and validation loss (vertical axis: loss) against epoch (horizontal axis: 0 to 90)]

Figure 5. Loss graph for multiclass classification (left) and binary classification (right)


Table 5. Performance comparison of the COVID-CXDNetV2 model with other existing models 


                                                         Multiclass classification           Binary classification
Approach                          Method                 Accuracy  Sensitivity  Specificity  Accuracy  Sensitivity  Specificity
                                                         (%)       (%)          (%)          (%)       (%)          (%)
Sethy et al. [28]                 ResNet50 + SVM         -         -            -            95.38     -            -
Hemdan et al. [29]                COVIDX-Net             -         -            -            90.0      -            -
Gunraj et al. [30]                COVID-Net              93.3      -            91.0         96.6      -            91.0
Apostolopoulos and Mpesiana [27]  MobileNetV2            93.5      98.6         -            96.7      98.6         -
Narin et al. [31]                 ResNet50               -         -            -            98.0      96.0         -
Ozturk et al. [12]                DarkCovidNet           87.02     -            -            98.08     -            -
Rahimzadeh and Attar [15]         Xception + ResNet50V2  91.4      -            80.53        -         -            -
Heidari et al. [13]               VGG16                  94.5      98.4         -            98.1      98.4         -
Apostolopoulos and Mpesiana [27]  VGG19                  93.48     92.85        98.75        98.75     92.85        98.75
Khan et al. [16]                  CoroNet                95        -            95.0         98.8      -            95.0
Rahimzadeh and Attar [15]         Xception + ResNet50V2  -         -            -            99.5      -            80.53
Abbas et al. [18]                 DeTraC Deep CNN        95.12     97.91        91.87        -         -            -
Proposed model                    COVID-CXDNetV2         97.9      99.13        99.81        99.79     99.52        100


4. CONCLUSION AND FUTURE WORK 


COVID-19 has become a life-threatening disease worldwide, and many researchers are working
to detect this disease. Computer vision-based recognition is one of the most prominent ways to
detect COVID-19 from X-ray images. Pneumonia is a lung disease with very similar symptoms and
can also be detected from X-ray images. This research proposed a deep learning-based COVID-19 and
pneumonia detection model that recognizes the disease as a multiclass and binary classification
problem from X-ray images. The proposed model is built by modifying YOLOv2 with ResNet. A
customized dataset of 3,788 2D posteroanterior (PA) X-ray images has been used for training the
model. Among those X-rays, 1,102 images are labeled COVID-19 positive, 1,345 images are labeled
pneumonia positive, and 1,341 are labeled normal, meaning they are both COVID-19 and pneumonia
negative. The model has been compared with other existing computer vision-based COVID-19
and pneumonia detection results, and it achieved the highest accuracy, sensitivity,
and specificity among them. In addition, the training loss recorded was 0.052 for multiclass classification and 0.001
for binary classification. The accuracy of this model exceeds the
state-of-the-art results in this research area. The accuracy of this model largely depends on the
dataset. Increasing the number of samples by collecting more images or by image synthesis and augmentation
may help increase the proposed model's accuracy.